Not all tweets have geolocation information available. When accessing the Twitter API via the streamR package, particular parameters can be set to include only those tweets with geolocation enabled. You can also specify a bounding box to further filter the geographic area from which you would like to draw your sample of tweets. Now, the bounding box is going to be just that, a box, and we are dealing with polygons. To isolate tweets from a specific geo-political region, such as a census tract, you can use the sp package to clip the points to a polygon. For the details on how to both get geo-tagged tweets in R and how to clip them to fit a specific spatial polygon, refer to my previous post Access Twitter posts by country.
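As a quick refresher, the collection step looks roughly like this. This is a minimal sketch, not the exact code from the earlier post: the bounding-box coordinates are only a rough approximation of Pima County's extent, and my_oauth is assumed to be an OAuth token you have already set up (see the streamR documentation for how to create one).

```r
library(streamR)

# Assumption: `my_oauth` is an OAuth token created beforehand with ROAuth.
# `locations` takes c(sw_longitude, sw_latitude, ne_longitude, ne_latitude);
# the values below roughly box in Pima County, AZ.
filterStream(file.name = "pima_tweets.json",
             locations = c(-113.4, 31.4, -110.4, 32.5),
             timeout = 300,   # collect tweets for 5 minutes
             oauth = my_oauth)

# Parse the streamed JSON into a data frame
pima.tweets <- parseTweets("pima_tweets.json")
```

From there, the sp package can clip these points to the Pima County polygon, as covered in the previous post.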
Picking up from last time
Our starting point, therefore, looks like this:
ls()
## [1] "p" "p.roadmap" "pima.tweets" "span.total"
## [5] "speakers.total"
p and p.roadmap are the plots, and span.total and speakers.total are the variables I created in the last post from the American Fact Finder data to visualize the level of Spanish speakers by census tract.


Here I will be working with a small set of tweets collected from Twitter and clipped to include only posts that emanated from within Pima County, Arizona, the county in which Tucson resides. So here's a quick look at the variables in the data:
pima.tweets %>% names
[1] "lon" "lat"
[3] "text" "retweet_count"
[5] "favorited" "truncated"
[7] "id_str" "in_reply_to_screen_name"
[9] "source" "retweeted"
[11] "created_at" "in_reply_to_status_id_str"
[13] "in_reply_to_user_id_str" "lang"
[15] "listed_count" "verified"
[17] "location" "user_id_str"
[19] "description" "geo_enabled"
[21] "user_created_at" "statuses_count"
[23] "followers_count" "favourites_count"
[25] "protected" "user_url"
[27] "name" "time_zone"
[29] "user_lang" "utc_offset"
[31] "friends_count" "screen_name"
[33] "country_code" "country"
[35] "place_type" "full_name"
[37] "place_name" "place_id"
[39] "place_lat" "place_lon"
[41] "expanded_url" "url"
There is plenty of interesting information to play around with here, but note that fields populated by user input often contain unreliable information. In this post I'll only need a few key features (lon, lat, and text) plus one other (lang), which facilitates my aim to explore the relationship between language choice on Twitter and US Census demographic information.
pima.tweets <- subset(pima.tweets,
                      select = c("lon", "lat", "text", "lang"))
To add points to our map corresponding to Twitter posts, we use the geom_point function, specifying the pima.tweets dataset.
p + geom_point(data = pima.tweets,
               aes(x = lon, y = lat, group = 1))

There are various aesthetics that ggplot2 makes available for visualizing language (lang). In this case I don't want to see languages other than English and Spanish, so I will subset the data to en and es and map lang to the color aesthetic. Note that I'm naively trusting the language detection algorithm that Twitter uses.
pima.tweets <- subset(pima.tweets, lang == 'en' | lang == 'es')
p + geom_point(data = pima.tweets,
               aes(x = lon, y = lat, group = 1,
                   color = lang)) +
  scale_color_manual(values = c("yellow", "red"), name = "Language")

If you're me, you're thinking it would be cool to see what the content of these tweets is. The plotly package can be hooked up to ggplot2 for a really cool effect in which the tweet text appears when hovering over a point on the map.
Just load the plotly library, create your standard plot, and then apply the ggplotly() function.
library(plotly)
pp <- p + geom_point(data = pima.tweets,
                     aes(x = lon, y = lat, group = 1,
                         color = lang, text = text)) +
  scale_color_manual(values = c("yellow", "red"), name = "Language")
ggplotly(pp)